Where is the best place in the Boston area for families who like outdoor activities to live?

This analysis uses five indicators to assess ideal locations for individials and families to live who care about having easy access to outdoor activities in the Boston area. The analysis occurs within the Boston MPO boundary and produces the top zip code areas.

We'll be using a tree canopy raster to transform our four other indicators into rasters as well. Below are all five indicators we'll be using.

  1. Tree Canopy Raster from the NLCD
  2. Farmers Markets from MassGIS
  3. Bike trails from MassGIS
  4. Public pools from MassGIS
  5. Blue bikes stations from BlueBikes

Follow the steps below to see how the analysis works and discover where the best places are to live in the Boston area if you care about outdoor activities for families.

First, let's import the packages we'll need to conduct this analysis.

We are going to be using the Boston Region Metropolitan Planning Organization's (MPO) boundaries shapefile to restrict our analysis to just the Boston area. This is a already a shapefile, so that makes this step easy. Follow the steps below to take a look at the data columns, and then to see the data visualized as a map.

Looks good! But we just want the Boston area boundary. We will need to select just the Boston area. While we're at it, we may want to check if we need to reproject to the correct coordinate system. For more on coordinate systems, click this link.

Run the code below to check the coordinate system.

Hmm, according to my research, the best projection for this analysis is EPSG: 6491. So we are going to want to change that as well. Okay, let's select the Boston area and reproject.

Now let's see what that boston area looks like.

Let's also double check we are using the correct coordinate system.

Looks good! Time to bring in our zip codes and reproject again. Once again we have another shapefile to work with.

We want zip code polygons for our analysis, but only for the Boston area. So we'll need to clip the data to just the boston_mpo_area polygon we created above.

Looks good! We have our study area. Now to start pulling in our indicators.

First we're going to pull in our tree canopy raster. This will be pretty important, because we are going to use it to transform our other indicators into rasters themselves.

Below are steps to pull in our raster data and take a quick look at some of its features. We will also name several raster attributes that will be important for our analysis.

Raster data is in the form of an array, as you can see above.

Hm, this tree canopy data is a little confusing! Let's look at the data another way to learn more. We can look at the raster band using a histogram.

Okay! Looks like it is in percentages, so 10 means 10% tree coverage. Don't worry about the peak at 255, that means "no data". Now that we know that we can decide how we rank our tree canopy. The higher the ranking, the better. While having a lot of tree canopy is important, it can be hard to come by in the Boston area! I have decided to give anything above 50% a five, and then slowly give it a less desireable ranking from there.

Min Caonpy Cover(inc) Max Caonpy Cover(exclus) Ranking
... 10% 1
10% 20% 2
20% 30% 3
30% 50% 4
50% ... 5

Time to reclassify this data into rankings.

What does the array look like now?

Let's visualize the reclassified data.

Looks good! Time to pull in some other vector data. First we'll pull in farmers market points.

Let's visualize the farmers market points. It would be better to put them on a basemap so they're aren't floating in space. The contextily package will help with that.

Gotta check that pesky CRS again! What's the CRS of the farmer's market data?

Once again this is not the correct coordinate system for our purposes. Let's reproject and make sure it sticks.

Now time to make our lovely farmers market points into a raster. We're going to do that by creating a binary raster. We will use the shape and transform from our tree canopy raster. To make the binary raster, all the places where there are points will be 0, our default_value, all the places where there are not points will be 1, our fill. See code below.

Alright, so it is a raster, so what? Why do we care about farmers markets? Well, they are a fun outdoor activity for families, but it is not fun to have to travel far to get to them (and not good for the environment!) So we're going to calculate euclidean distance to the farmers markets to find out the areas where your family is closest to a farmers market!

Euclidean distance can be a bit confusing- but we basically just want to to take all our points and then calculate the distance from the point and every other pixel (or cell) on our map. We'll then be able to visually show distance from a certain set of points on a map.

The above array is in number of cells. That isn't really helpful for us to use to come up with our classification scheme. So we're goint to want to change that. To get the distance in spatial units we will need to multiply it by cell size. Good thing we pulled that in when we first pulled in our tree canopy rastier. It is res_tree. We can even find out what that cell size is now.

Okay, let's look at the array again and see what that did.

Hm, a bunch of numbers! Remember when we checked the coordinate system above? That actually had a lot of useful info about this data. If you scroll up, you can see that the data are in meters (or metres). We'll need that info when we reclassify.

For now, let's marvel in our coding skills and look at our euclidean distance raster.

Trippy! Break out the Pink Floyd records! Just kidding, it's time to reclassify our farmers market euclidean distance raster. For this reclassification scheme, I wanted to prioritize walking distance to farmers markets. Half a mile or less would be best, and a mile or less wouldn't be too bad. Above that, we'll probably have to bike, but anything above 5 miles would be a little annoying with all of your vegetable finds. See reclassification scheme below.

Min Distance (Inclusive) Max Distance (Exclusive) Ranking
... 0.5 mile (804.67 m) 5
0.5 mile (804.67 m) 1 miles (1609.34 m) 4
1 miles (1609.34 m) 2.5 miles (4023.36 m) 3
2.5 miles (4023.36 m) 5 miles (8046.72 m) 2
5 miles (8046.72 m) ... 1

Let's look at the reclassified raster.

Great, looks like what we would expect, all the farmers market points are now surrounded by our euclidean distance classification.

On to the next set of vector data we're using for this analysis - bike trails! We're going to follow the same workflow we followed above, for farmers market. Even though these are lines and not points, we can still do it the same way. Keep an eye on those comments to see what we're doing.

Now we are going to make our binary raster. We're going to be doing this a few more times, however, so it might be nice if we wrote a function to make this processes smoother in the future.

We'll use a pretty similar reclassification scheme as we did for farmers markets, but since we won't be carrying heavy things home from the market, we don't need to have as low a threshold for the max distance for a ranking of 5. See classification scheme below.

Min Distance (Inclusive) Max Distance (Exclusive) Ranking
... 1 mile (1609.34 m) 5
1 mile (1609.34 m) 2 miles (3218.69 m) 4
2 miles (3218.69 m) 3 miles (4828.03) 3
3 miles (4828.03) 7 miles (11265.4) 2
7 miles (11265.4) ... 1

Next, we're going to think about DCR pools. Nothing beats a pool day in the summer! But it's no fun to look for parking near the pool. Ideally we'll be able to bike or walk.

Now we can use our function again to make things a little easier.

Time to reclassify again. Like I mentioned above, driving and parking at the pool is no fun. We're prioritizing locations close to a pool, using the scheme below.

Min Distance (Inclusive) Max Distance (Exclusive) Ranking
... 0.5 mile (804.67 m) 5
0.5 mile (804.67 m) 1 miles (1609.34 m) 4
1 miles (1609.34 m) 2.5 miles (4023.36 m) 3
2.5 miles (4023.36 m) 5 miles (8046.72 m) 2
5 miles (8046.72 m) ... 1

On to BlueBikes data. This is a fun one because it is not from MassGIS but is from BlueBikes the company. It is also not in the form of a shapefile, like all of our other data was, it is just a CSV with coordinates on it. That's okay, there's only one extra step to make it into a geodataframe.

Hm, what does this CSV looks like?

Okay, there are our Latitude and Longitude columns. Not too bad. Let's make this into a geodataframe so we can use it for our spatial analysis.

Still gotta check that crs.

Wait what? Nothing happened? Is my computer broken? Nope, it just doesn't have a CRS set yet. We're going to need to set a CRS before we can reproject. We're going to set it to EPSG:4326, because we know this data is in Boston/Cambridge/Brookline, because that is where BlueBikes are.

Now, reproject to our favorite projection, EPSG:6491.

Time to reclassify BlueBike station distance. If you are renting a BlueBike, you probably don't have your bike or car with you. So you wouldn't want to be too far. So we're going to lower that best ranking distance to .25 miles or less.

Min Distance (Inclusive) Max Distance (Exclusive) Ranking
... 0.25 miles (402.336 m) 5
0.25 mile (402.336 m) .5 miles (804.672 m) 4
.5 miles (804.672 m) 2 miles (3218.69 m) 3
2 miles (3218.69 m) 5 miles (8046.72 m) 2
5 miles (8046.72 m) ... 1

Okay! Now we've reclassified everything so it can all be used together. Time to figure out where we should live. Below I have prioritized what I think is most important. I really like biking, so distance to bike trails and BlueBikes stations is most important to me. The other factors are important as well, but less so.

We could just add up all the reclassified rasters. That imagines that everything is equally important.

Then, we can visualize it to see where the best place to live is! Red is best.

That's great. But the diffferent indicators have differing importance levels. So let's try it with a weighting scheme.

Component Weight
Distance to trails 30%
Distance to bluebikes 30%
Tree canopy 18%
Distance to Public Pools 12%
Distance to Farm. Markets 10%

Cool! A colorful blob! This isn't really helpful when looking for a place to live. Let's try to put this information in context.

Remember boston_zips, which we made at the beginning of our analysis? Time to use that. Let's look at that dataframe again.

We are going to use boston_zips to create a mask over the only part of our suitability raster we care about, the Boston area. First we will create another binary raster.

Now anything that is 1 is actually going to be NaN. This will help us make our mask. We can check this by looking at the array again.

Let's look at the mask quickly to make sure it is on the righ track.

Up until now our rasters have not had a legend, which was okay for intermediate steps but is not good for our final product! We also want a title. By defining the below function we will be able to add some title and legend functionality to our final visualizations.

That's good! It would be even better if we had a basemap, so we can emphasize we are in the Boston area of Massachusetts. We can add that using the code below.

The below map shows the zip code boundaries over our suitability raster. But it is still not super easy to know what we are looking at.

We can sort of get an idea where the best places to live would be based on the visualization, but it might be better if we just had an output of the zip codes that are best. To get that, we will use zonal statistics, and more specifically, we will use the mean statistic.

Looks like we have some means, but the indexing is off on this data frame. It needs to be reset before we can use it.

Now we can join our stats to the our zip codes polygon and name a new geodataframe.

Great, mean is now in our geodataframe, so we can map the mean by zip code.

Now if we want to start our house hunt, we'd really need to know the best zip codes in writing. Here is how we would get those.

If we ever have to expand our house hunt outside of our ideal zip codes, we may want to know what zip codes to avoid. So let's find out the five worst according to our analysis too, so we can make sure to never live there.

Let's map just the best!

And now let's map just the worst!

These maps are not super useful. It might be nice if we could zoom and pan. So let's make an interactive map using the folium package that shows our top five zip codes. First we will pull in a blank interactive map:

Next, we will add our zipcodes to the interactive map.

There you have it! The best zip codes to live in according to our indicators. If other indicators are important to you, you can use this workflow to pull in different data, explore a different area, or you can redo the weights to use these indicators prioritizing the indicators in a different order. The possibilities are endless!